The reliability of digital audio is increasingly critical in digital forensic analysis, legal proceedings, media authentication, and digital communication services. Among audio forgeries, copy-move forgeries pose a particular challenge: because they are created by duplicating a segment of the same recording, the copied and original segments share identical speaker characteristics, background noise, and environmental sounds, which makes them difficult to identify with conventional methods. This paper proposes a novel Hybrid Spectro-Spiral Graph Attention Framework for detecting copy-move forgeries in digital audio with improved computational efficiency. The proposed system incorporates an adaptive spectral attention mechanism for identifying suspicious regions, a Differential Evolution optimization method for frequency-band selection, a spiral-based graph encoding method for the structural representation of spectral features, and a multi-head Graph Attention Network for relational feature learning, improving on conventional approaches such as keypoint matching and swarm optimization. Experimental evaluation under various compression levels and additive-noise conditions shows higher detection accuracy and lower runtime than existing PSO-based systems. The framework also generalizes well and is computationally feasible for CPU-based deployment, making it useful for verifying audio authenticity and strengthening digital forensic systems.
Introduction
With the rise of digital audio technologies, audio content is widely used in legal, investigative, surveillance, and social media contexts. However, malicious audio manipulations, particularly copy-move forgery—where segments of audio are duplicated and inserted elsewhere in the same recording—pose significant challenges. Detecting such forgeries is difficult because the duplicated segments share identical spectral and noise characteristics with the original content.
Existing Detection Methods:
Spectrogram-based Analysis: Short-time Fourier Transform (STFT) converts audio into a time-frequency representation.
Keypoint Detection: Techniques like Scale-Invariant Feature Transform (SIFT) identify unique points in the spectrogram for comparison.
Optimization Techniques: Particle Swarm Optimization (PSO) locates dense matching regions in spectrograms.
Graph-based Deep Learning: Spectrogram patches are encoded into graphs (e.g., spiral pattern encoding) and classified with Convolutional Neural Networks (CNNs).
Other approaches include cochleagram analysis with SSIM, deep learning for splicing/deepfake detection, MFCC feature-based classifiers, and multi-input CNNs for audio feature extraction.
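The spectrogram step listed above can be illustrated with a short sketch. The following computes an STFT magnitude spectrogram with SciPy; the 16 kHz sample rate, 512-sample Hann window, and 50% overlap are illustrative choices, not parameters from the paper:

```python
import numpy as np
from scipy.signal import stft

# Synthetic 1-second test signal: a 440 Hz tone sampled at 16 kHz
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)

# STFT with 512-sample windows and 50% overlap (illustrative settings)
f, frames, Z = stft(x, fs=fs, nperseg=512, noverlap=256)
spectrogram = np.abs(Z)  # time-frequency magnitude representation

# The strongest frequency bin should sit near 440 Hz
peak_bin = int(spectrogram.mean(axis=1).argmax())
print(f[peak_bin])  # within one bin width (31.25 Hz) of 440 Hz
```

Each column of `spectrogram` is one analysis frame, and it is this time-frequency matrix that keypoint detectors and matching schemes subsequently operate on.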
Limitations of Existing Methods:
High computational cost due to exhaustive keypoint comparisons.
Slow convergence of optimization algorithms like PSO.
Sensitivity to post-processing operations: compression, filtering, noise.
Inefficient modeling of structural relationships between spectral components.
Difficulty detecting short duplicated segments within long audio recordings.
Problem Statement:
Copy-move audio forgery is particularly challenging because duplicated segments are acoustically coherent with the original, unlike splicing attacks. Existing systems struggle with:
Robust detection of short or subtle duplicates.
Resistance to compression, noise, and other distortions.
Efficient computation for long recordings.
Capturing contextual and structural dependencies in spectral features.
Proposed Solution:
The paper proposes a Hybrid Spectro-Spiral Graph Attention Framework for audio copy-move forgery detection, which combines:
Spectral Attention: Focuses on important frequency regions in the audio signal.
Spiral Graph Encoding: Converts spectrogram patches into graph representations that preserve structural relationships.
Graph-based Deep Learning: Learns hierarchical relationships between spectral components to improve detection accuracy.
Evolutionary Optimization: Enhances the efficiency of matching duplicated audio segments.
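One plausible reading of the spiral graph encoding component can be sketched as follows. The 8x8 patch size, the clockwise inward spiral traversal, and the chain-style edges between consecutive spiral positions are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def spiral_order(n):
    """Visit the cells of an n x n grid in a clockwise inward spiral,
    returning the (row, col) coordinates in visiting order."""
    coords = []
    top, bottom, left, right = 0, n - 1, 0, n - 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):      # top row, left to right
            coords.append((top, c))
        top += 1
        for r in range(top, bottom + 1):      # right column, downward
            coords.append((r, right))
        right -= 1
        if top <= bottom:
            for c in range(right, left - 1, -1):  # bottom row, right to left
                coords.append((bottom, c))
            bottom -= 1
        if left <= right:
            for r in range(bottom, top - 1, -1):  # left column, upward
                coords.append((r, left))
            left += 1
    return coords

# Encode a spectrogram patch as a graph: each spiral position becomes a
# node carrying the spectral magnitude at that cell, and consecutive
# positions along the spiral are linked by edges.
rng = np.random.default_rng(0)
patch = rng.random((8, 8))            # stand-in for an 8x8 spectrogram patch
order = spiral_order(8)
node_features = np.array([patch[r, c] for r, c in order])
edges = [(i, i + 1) for i in range(len(order) - 1)]

print(len(node_features), len(edges))  # 64 nodes, 63 edges
```

The resulting node features and edge list are the kind of structural representation a multi-head Graph Attention Network could then consume for relational feature learning.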
Conclusion
In the present work, three approaches, namely GA-SVM, CQT Spectral with Random Forest, and the Hybrid Graph Attention (MLP) model, have been implemented for detecting forged audio. The experimental results show that the Hybrid Graph Attention model outperforms the other two, achieving a maximum accuracy of 95.64%, compared with 92.13% for the GA-SVM model and 89.34% for the Random Forest (CQT Spectral) model. The proposed hybrid model therefore yields better results for forged-audio detection than the other machine learning models considered.